Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Automated mechanistic interpretation research has attracted great interest due to its potential to scale explanations of neural network internals to large models. Existing automated circuit discovery work relies on activation patching or its ap- proximations to identify subgraphs in models for specific tasks (circuits). They often suffer from slow runtime, approximation errors, and specific requirements of metrics, such as non-zero gradients. In this work, we introduce contextual decomposition for transformers (CD-T) to build interpretable circuits in large lan- guage models. CD-T can produce circuits at any level of abstraction and is the first to efficiently produce circuits as fine-grained as attention heads at specific sequence positions. CD-T is compatible to all transformer types, and requires no training or manually-crafted examples. CD-T consists of a set of mathematical equations to isolate contribution of model features. Through recursively comput- ing contribution of all nodes in a computational graph of a model using CD-T followed by pruning, we are able to reduce circuit discovery runtime from hours to seconds compared to state-of-the-art baselines. On three standard circuit eval- uation datasets (indirect object identification, greater-than comparisons, and doc- string completion), we demonstrate that CD-T outperforms ACDC and EAP by better recovering the manual circuits with an average of 97% ROC AUC under low runtimes. In addition, we provide evidence that faithfulness of CD-T circuits is not due to random chance by showing our circuits are 80% more faithful than random circuits of up to 60% of the original model size. Finally, we show CD-T circuits are able to perfectly replicate original models’ behavior (faithfulness = 1) using fewer nodes than the baselines for all tasks. Our results underscore the great promise of CD-T for efficient automated mechanistic interpretability, paving the way for new insights into the workings of large language models. All code for using CD-T and reproducing results is made available on Github (https://github.com/adelaidehsu/CD_Circuit).more » « lessFree, publicly-accessible full text available January 22, 2026
-
Abstract It is common to conduct causal inference in matched observational studies by proceeding as though treatment assignments within matched sets are assigned uniformly at random and using this distribution as the basis for inference. This approach ignores observed discrepancies in matched sets that may be consequential for the distribution of treatment, which are succinctly captured by within-set differences in the propensity score. We address this problem via covariate-adaptive randomization inference, which modifies the permutation probabilities to vary with estimated propensity score discrepancies and avoids requirements to exclude matched pairs or model an outcome variable. We show that the test achieves type I error control arbitrarily close to the nominal level when large samples are available for propensity score estimation. We characterize the large-sample behaviour of the new randomization test for a difference-in-means estimator of a constant additive effect. We also show that existing methods of sensitivity analysis generalize effectively to covariate-adaptive randomization inference. Finally, we evaluate the empirical value of combining matching and covariate-adaptive randomization procedures using simulations and analyses of genetic damage among welders and right-heart catheterization in surgical patients.more » « less
-
The screening testing is an effective tool to control the early spread of an infectious disease such as COVID‐19. When the total testing capacity is limited, we aim to optimally allocate testing resources amongncounties. We build a (weighted) commute network on counties, with the weight between two counties a decreasing function of their traffic distance. We introduce a network‐based disease model, in which the number of newly confirmed cases of each county depends on the numbers of hidden cases of all counties on the network. Our proposed testing allocation strategy first uses historical data to learn model parameters and then decides the testing rates for all counties by solving an optimization problem. We apply the method on the commute networks of Massachusetts, USA and Hubei, China, and observe its advantages over testing allocation strategies that ignore the network structure. Our approach can also be extended to study the vaccine allocation problem.more » « less
An official website of the United States government
